Automatic Threshold Estimation for Data Matching Applications

نویسندگان

  • Juliana Bonato dos Santos
  • Carlos Alberto Heuser
  • Viviane Pereira Moreira
  • Leandro Krug Wives
چکیده

Several advanced data management applications, such as data integration, data deduplication or similarity querying rely on the application of similarity functions. A similarity function requires the definition of a threshold value in order to assess if two different data instances match, i.e., if they represent the same real world object. In this context, the threshold definition is a central problem. In this paper, we propose a method for the estimation of the quality of a similarity function. Quality is measured in terms of recall and precision calculated at several different thresholds. On the basis of the results of the proposed estimation process, and taking into account the requirements of a specific application, a user is able to choose a threshold value that is adequate for the application. The proposed estimation process is based on a clustering phase performed on a sample taken from a data collection and requires no human intervention.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Structural Matching Method Based on Linear Features for High Resolution Satellite Images

  Along with commercial accessibility of high resolution satellite images in recent decades, the issue of extracting accurate 3D spatial information in many fields became the centre of attention and applications related to photogrammetry and remote sensing has increased. To extract such information, the images should be geo-referenced. The procedure of georeferencing is done in four main steps...

متن کامل

Performance Evaluation of Local Detectors in the Presence of Noise for Multi-Sensor Remote Sensing Image Matching

Automatic, efficient, accurate, and stable image matching is one of the most critical issues in remote sensing, photogrammetry, and machine vision. In recent decades, various algorithms have been proposed based on the feature-based framework, which concentrates on detecting and describing local features. Understanding the characteristics of different matching algorithms in various applications ...

متن کامل

Learning Threshold Parameters for Event Classi cation in Broadcast News

In this paper we present two methods for automatic threshold parameter estimation for an event tracking algorithm. We view the threshold as a statistic of the incoming data stream, which is assumed to contain broadcast news stories from radio, television, and newswire sources. Query bias deened in terms of threshold estima-tors can be identiied when a word co-occurrence representation for text ...

متن کامل

Measuring quality of similarity functions in approximate data matching

This paper presents a method for assessing the quality of similarity functions. The scenario taken into account is that of approximate data matching, in which it is necessary to determine whether two data instances represent the same real world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing such estimation. The fir...

متن کامل

Automatic Bounding Estimation in Modified Nlms Algorithm

Modified Normalized Least Mean Square (MNLMS) algorithm, which is a sign form of NLMS based on set-membership (SM) theory in the class of optimal bounding ellipsoid (OBE) algorithms, requires a priori knowledge of error bounds that is unknown in most applications. In a special but popular case of measurement noise, a simple algorithm has been proposed. With some simulation examples the performa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Sci.

دوره 181  شماره 

صفحات  -

تاریخ انتشار 2008